feat: added client side metric instrumentation to basic rpcs #1188

daniel-sanche · 2025-08-11T18:59:04Z

This PR builds on #1187 to add instrumentation to the basic data client RPCs (check_and_mutate, read_modify_write, sample_row_keys, mutate_row).

Metrics are not currently exported anywhere; they are collected and then dropped. A future PR will add a GCP exporter to the system.

Co-authored-by: Mattie Fu <mattiefu@google.com>

mutianf · 2026-01-28T19:05:33Z

tests/system/data/test_metrics_async.py

+        for i in range(num_retryable):
+            attempt = handler.completed_attempts[i]
+            assert isinstance(attempt, CompletedAttemptMetric)
+            assert attempt.end_status.name == "ABORTED"


Not related to the metrics, but ABORTED shouldn't be retried for mutate row?

It doesn't look like it, and it's not listed as a retryable error in the service config. Is it in Java? It would be easy to add here.
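For readers following the thread: whether a status such as ABORTED gets retried typically comes down to membership in a retryable-code set. Below is a minimal sketch of that idea. The `StatusCode` enum is a small stand-in for `grpc.StatusCode`, and `RETRYABLE_CODES` and `is_retryable` are hypothetical names with illustrative contents; none of this is taken from the client or from the service config.

```python
from enum import Enum


class StatusCode(Enum):
    """Stand-in for grpc.StatusCode; each value is an (int code, description) pair."""

    OK = (0, "ok")
    DEADLINE_EXCEEDED = (4, "deadline exceeded")
    PERMISSION_DENIED = (7, "permission denied")
    ABORTED = (10, "aborted")
    UNAVAILABLE = (14, "unavailable")


# Hypothetical retryable set, for illustration only (not the real service config).
RETRYABLE_CODES = frozenset({StatusCode.DEADLINE_EXCEEDED, StatusCode.UNAVAILABLE})


def is_retryable(code, retryable=RETRYABLE_CODES):
    """Return True if an attempt ending with `code` should be retried."""
    return code in retryable


# With the illustrative set above, ABORTED is treated as terminal...
aborted_by_default = is_retryable(StatusCode.ABORTED)
# ...and adding it is a one-line change to the set.
aborted_when_added = is_retryable(
    StatusCode.ABORTED, RETRYABLE_CODES | {StatusCode.ABORTED}
)
```

Under these assumptions, making ABORTED retryable is only a matter of extending the set, which matches the "easy to add" remark above.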

mutianf · 2026-01-28T19:08:35Z

tests/system/data/test_metrics_async.py

+        assert attempt.end_status.value[0] == 0
+        assert attempt.backoff_before_attempt_ns == 0
+        assert (
+            attempt.gfe_latency_ns > 0 and attempt.gfe_latency_ns < attempt.duration_ns


How is gfe_latency_ns injected into the header? This should be an exact number instead of a range, since we can set it in the header.

I think this is testing against the true backend response here, not a mocked value

These are system tests, so I was trying to limit the amount of mocking used here, although some other tests do inject fake exceptions into the stream to test retry logic
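As a rough illustration of why the assertion is a range: when the value comes from a real backend response, the client can only parse whatever latency the server reports. The sketch below assumes the latency arrives as a `server-timing` response metadata entry of the form `gfet4t7; dur=<milliseconds>`; that header name and format, and the `parse_gfe_latency_ns` helper, are assumptions for illustration and are not confirmed anywhere in this thread.

```python
import re

# Assumed header format: "gfet4t7; dur=<milliseconds>"
_SERVER_TIMING_RE = re.compile(r"gfet4t7;\s*dur=(\d+(?:\.\d+)?)")


def parse_gfe_latency_ns(metadata):
    """Return the reported latency in nanoseconds, or None if no usable header exists.

    `metadata` is an iterable of (key, value) string pairs, as response
    metadata is commonly represented.
    """
    for key, value in metadata:
        if key.lower() == "server-timing":
            match = _SERVER_TIMING_RE.search(value)
            if match:
                # Convert milliseconds to nanoseconds.
                return int(float(match.group(1)) * 1_000_000)
    return None


# A response that carries the header yields a concrete number...
with_header = parse_gfe_latency_ns([("server-timing", "gfet4t7; dur=12")])
# ...while a response without it (for example, an error raised before
# reaching the server) yields None.
without_header = parse_gfe_latency_ns([("content-type", "application/grpc")])
```

A mocked test could pin the header to a fixed value and assert equality, whereas a system test against the live service can only bound the result.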

mutianf · 2026-01-28T19:09:29Z

tests/system/data/test_metrics_async.py

+        final_attempt = handler.completed_attempts[num_retryable]
+        assert isinstance(final_attempt, CompletedAttemptMetric)
+        assert final_attempt.end_status.name == "PERMISSION_DENIED"
+        assert final_attempt.gfe_latency_ns is None


I think as long as the request gets to the server, gfe_latency_ns should not be None. So this is probably related to the test setup?

It looks like this test is testing the case where we have multiple transient retryable errors, before encountering a terminal error. For the errors, I'm using a custom error_injector class, to control which errors are triggered in which sequence. So this test wouldn't ever be reaching the real backend

It's been a while since I looked at this, but I remember having issues fully controlling the backend errors in the way I needed for some of these tests, which is why I used the error_injector. There are other tests in this file that send unauthorized requests to trigger a real backend failure though.
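To make the test setup described above concrete, here is a minimal sketch of the pattern: a queue of injected errors is raised in order, so the test sees several retryable failures followed by a terminal one without ever contacting a backend. `ErrorInjector`, `RetryableError`, `TerminalError`, and `call_with_retries` are hypothetical names for illustration; this is not the `error_injector` class from the test suite.

```python
class RetryableError(Exception):
    """Stand-in for a transient error such as ABORTED in the test."""


class TerminalError(Exception):
    """Stand-in for a non-retryable error such as PERMISSION_DENIED."""


class ErrorInjector:
    """Hypothetical helper: raises queued exceptions, one per call, in order."""

    def __init__(self):
        self._queue = []

    def push(self, exc):
        self._queue.append(exc)

    def maybe_raise(self):
        # Raise the next queued error, if any; otherwise do nothing.
        if self._queue:
            raise self._queue.pop(0)


def call_with_retries(injector, max_attempts=5):
    """Run attempts until success or a terminal error; return per-attempt outcomes."""
    attempts = []
    for _ in range(max_attempts):
        try:
            injector.maybe_raise()
        except RetryableError as exc:
            attempts.append(type(exc).__name__)
            continue  # transient failure: try again
        except TerminalError as exc:
            attempts.append(type(exc).__name__)
            break  # terminal failure: stop retrying
        attempts.append("OK")
        break
    return attempts


injector = ErrorInjector()
num_retryable = 2
for _ in range(num_retryable):
    injector.push(RetryableError("injected"))
injector.push(TerminalError("injected"))
outcomes = call_with_retries(injector)
```

Because every failure here is raised locally, nothing in this flow produces server-side data, which is consistent with gfe_latency_ns being None for the injected attempts.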

daniel-sanche and others added 30 commits July 25, 2025 16:12

use replaceable channel wrapper

d2175f1

got unit tests working

5e107fc

put back in cache invalidation

c4a97e1

added wrapped multicallables to avoid cache invalidation

e71b1d5

added crosssync, moved close logic back to client

b81a9be

generated sync code

a1dffb5

got tests running

e3ec02b

fixed tests

4e13783

remove extra wrapper; added invalidate_stubs helper

7d90a04

fixed lint

26cd601

fixed lint

375332f

renamed replaceablechannel to swappablechannel

428d75a

added tests

4b39bc5

added docstrings

3f090c2

Merge branch 'main' into refactor_refresh

883ceab

initial commit

04c762a

added back interceptor

29dff4d

added metrics to client

e4f8238

fixed lint

fcb062e

Merge branch 'refactor_refresh' into csm_1_data_model

ac8dbe4

set up channel interceptions

d155f8a

added TrackedBackoffGenerator

9fece96

fixed lint

aec2577

fixed import

ec4e847

added stdout handler to test

0d93889

instrumented check_and_mutate

a580fa2

added instrumentation to read_modify_write

bc13b46

added instrumentation to sample_row_keys

6c3be46

instrumented mutate_row

ca38615

added instrumentation to mutate_rows

eb82ae9

daniel-sanche and others added 14 commits November 26, 2025 16:56

added docstring

22eb2e1

fixed type

0ec8d14

import annotations

fa25c2b

Update google/cloud/bigtable/data/_metrics/data_model.py

f9ac548

Co-authored-by: Mattie Fu <mattiefu@google.com>

addressed PR comments

284c8a6

improved state machine

16f7d57

added negative check

d167487

removed uuid

c408e14

fixed lint

a70824b

added cryptography to prerelease deps

4ceab60

updated state diagram

78c640b

Merge branch 'csm_1_data_model' into csm_2_instrumentation

5b8c22b

removed read_rows and mutate_rows instrumentation

d3a9013

reverted unneeded files

83c806a

daniel-sanche changed the title ~~[DRAFT] feat: added client side metric instrumentation to data client~~ [DRAFT] feat: added client side metric instrumentation to basic rpcs Jan 15, 2026

fixed tests

bacc505

daniel-sanche changed the title ~~[DRAFT] feat: added client side metric instrumentation to basic rpcs~~ feat: added client side metric instrumentation to basic rpcs Jan 15, 2026

daniel-sanche marked this pull request as ready for review January 15, 2026 21:13

daniel-sanche requested review from a team as code owners January 15, 2026 21:13

blunderbuss-gcf bot assigned mutianf Jan 15, 2026

fixed lint

ced3ee3

daniel-sanche mentioned this pull request Jan 15, 2026

[DRAFT] feat: added client side metric instrumentation to read_rows and mutate_rows #1256

Draft

daniel-sanche added the kokoro:force-run label Jan 15, 2026

yoshi-kokoro removed the kokoro:force-run label Jan 15, 2026

Base automatically changed from csm_1_data_model to main January 22, 2026 01:20

Merge branch 'main' into csm_2_instrumentation

700ebe3

mutianf reviewed Jan 28, 2026

updated year

9d3588a
